Natural Language Program Analysis: Combining Natural Language Processing with Program Analysis to Improve Software Maintenance Tools

Author

  • David Shepherd
Abstract

Because software systems are large and complex, developers often use software tools to understand unfamiliar code. These tools in turn rely on information about the program in the form of various program representations, which can provide detailed program information. Because traditional program representations do not capture the natural language clues in code, they often fail to assist the developer during high-level program understanding tasks. To bridge the gap between current software tools and software developers’ high-level questions, we propose supplementing traditional program representations with a natural language representation that exploits the information embedded in the program’s names and comments. Any software tool that uses a program representation must automatically construct that representation. To automatically construct our natural language program representation, we combined natural language processing and traditional program analysis techniques. With these techniques, we extract the natural language clues from the method names, class names, and comments in a program. We evaluated the usefulness of our natural language program representation by developing two software tools that access our representation. The first, a software search tool called FindConcept, locates code segments relevant to a developer’s query, a common first step in development tasks. In a user study, FindConcept found code segments more effectively and more consistently than a state-of-the-art information retrieval search tool and a lexical search tool. The second, an aspect mining tool called Timna, identifies code segments that could be more
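
The abstract mentions extracting natural language clues from method and class names. As a rough illustration only, the following minimal sketch splits identifiers into lowercase words, assuming camelCase and underscore naming conventions. The function name `split_identifier` and the splitting rules are assumptions made for this example; they are not the author's implementation, which combines natural language processing with program analysis in ways the abstract does not detail.

```python
import re


def split_identifier(name):
    """Split a camelCase or snake_case identifier into lowercase words.

    Hypothetical helper for illustration; not taken from the thesis.
    """
    words = []
    # First break on underscores and any other non-alphanumeric characters.
    for part in re.split(r"[_\W]+", name):
        # Then break each part into acronyms, capitalized words, and digit runs.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [w.lower() for w in words]


if __name__ == "__main__":
    for identifier in ["removeMenuItem", "parseXMLFile", "MAX_SIZE"]:
        print(identifier, split_identifier(identifier))
```

Words recovered this way could then be passed to further natural language processing, such as part-of-speech tagging, which this sketch does not attempt.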

Similar Resources

Natural Language-Based Software Analyses and Tools for Software Maintenance

Significant portions of software life cycle resources are devoted to program maintenance, which motivates the development of automated techniques and tools to support the tedious, error-prone tasks. Natural language clues from programmers’ naming in literals, identifiers, and comments can be leveraged to improve the effectiveness of many software tools. For example, they can be used to increase...

Parsing Formal Languages using Natural Language Parsing Techniques

Program analysis tools used in software maintenance must be robust and ought to be accurate. Many data-driven parsing approaches developed for natural languages are robust and have quite high accuracy when applied to parsing of software. We show this for the programming languages Java, C/C++, and Python. Further studies indicate that post-processing can almost completely remove the remaining er...

Natural Language in Software Engineering

The large time and effort devoted to software maintenance can be reduced by providing software engineers with software tools that automate tedious, error-prone tasks. However, despite the prevalence of tools such as IDEs, which automatically provide program information and automated support to the developer, there is considerable room for improvement in the existing software tools. The authors’...

Supporting Developers in Porting Software via Combined Textual and Structural Analysis of Software Artifacts

In the engineering and scientific domains software commonly has a long lifespan, lasting decades instead of years. Due to this lifespan, software often outlives the current generation of hardware, and in turn needs to be modified to execute on newer classes of hardware architectures [1]. Supporting developers in this difficult software maintenance activity is very important in order to improve ...

TR120625-42: Vocabulary Normalization’s Impact on IR-Based Concept Location

Tool support is crucial to modern software development, evolution, and maintenance. Early tools reused the static analysis performed by the compiler. These were followed by dynamic analysis tools and more recently tools that exploit natural language. This latter class has the advantage that it can incorporate not only the code, but artifacts from all phases of software construction and its subse...

Publication date: 2007